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Uniform error bounds for a continuous 
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In this work, we deal with approximations for distribution functions of non-negative random 
variables. More specifically, we construct continuous approximants using an acceleration tech- 
nique over a well-know inversion formula for Laplace transforms. We give uniform error bounds 
using a representation of these approximations in terms of gamma-type operators. We apply 
our results to certain mixtures of Erlang distributions which contain the class of continuous 
phase-type distributions. 
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1. Introduction 

Frequent operations in probability such as convolution or random summation of random 
variables produce probability distributions which are difficult to evaluate in an explicit 
way. In these cases, one needs to use numerical evaluation methods. For instance, one can 
use numerical inversion of the Laplace or Fourier transform of the distribution at hand 
(see [2] for the general use of Laplace-Stieltjes transforms in applied probability or [9, 11] 
for the method of Fast Fourier Transform in the context of risk theory) . Another approach 
is the use of recursive evaluation methods, of special interest for random sums (see [11, 18], 
for instance). Some of the methods mentioned above require a previous discretization 
step to be applied to the initial random variables when these are continuous. The usual 
way to do so is by means of rounding methods. However, it is not always possible to 
evaluate the distribution of the rounded random variable in an explicit way and it is 
not always clear when using these methods how the rounding error propagates when one 
takes successive convolutions. In these cases, it seems worthwhile to consider alternative 
discretization methods. For instance, when dealing with non-negative random variables, 
the following method ([10], page 233) has been proposed in the literature. Let X be a 
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random variable taking values in [0,oo) with distribution function F. Denote by 4>x{') 
the Laplace-Stieltjes (LS) transform of X, that is, 



(t) := Ec- tx = f e- tu dF(u), t>0. 

J[0,oo) 



For each t > 0, we define a random variable X mt taking values on k/t,k <G N, and such 
that 

P(X' t = k/t) = ^$\t), keN, (1) 



where ^x denotes the fcth derivative (</>x^ = 4>x)- 



M 
)) E 

Thus, if we denote by L\F the distribution function of X**, we have 

L* t F(x) := P(X** < x) = £ (t), x > 0, (2) 



fc! 

fe=0 

where [x] indicates the largest integer less than or equal to x. The use of this method 
allows one to obtain the probability mass function in an explicit way in some situations 
in which rounding methods could not (see, for instance, [4] for gamma distributions). 
Moreover, this method allows for an easy representation of L* t F in terms of F, which 
makes possible the study of rates of convergence in the approximation ([4, 5]). In [4], the 
problem was studied in a general setting, whereas in [5], a detailed analysis was carried 
out for the case of gamma distributions, that is, distributions whose density function is 
given by 

a P x p-l c -ax 

fa, P {x):= , x>0. (3) 

Also, in [16], error bounds for random sums of mixtures of gamma distributions were 
obtained, uniformly controlled on the parameters of the random summation index. In all 
of these papers, the measure of distance considered was the Kolmogorov (or sup-norm) 
distance. More specifically, for a given real- valued function / defined on [0, oo), we denote 
by ll/H the sup-norm, that is, 

H/ll := sup 1/(201. 

x>0 

It was shown in [5] that for gamma distributions with shape parameter p > 1 , we have 
that — F|| is of order 1/t, the length of the discretization interval. Note that 

Ij-LjF — F|| is the Kolmogorov distance between X and X %t , as both are non-negative 
random variables. 

The aim of this paper is twofold. First, we will consider a continuous modification of 
(2) and give conditions under which this continuous modification has rate of convergence 
of 1/t 2 instead of 1/t (see Sections 2 and 3). In Section 4, we will consider the case 
of gamma distributions to see that the error bounds are also uniform on the shape 
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parameter. Finally, in Section 5, we will consider the application of the results in Section 
4 to the class of mixtures of Erlang distributions, recently studied in [19]. This class 
contains many of the distributions used in applied probability (in particular, phase-type 
distributions) and is closed under important operations such as mixtures, convolution 
and compounding. 



2. The approximation procedure 

The representation of L* t F in (2) in terms of a Gamma process (see [4]) will play an 
important role in our proofs. We recall this representation. Let (S(u),u > 0) be a gamma 
process, in which S(0) = and such that for u > 0, each S(u) has a gamma density with 
parameters a = 1 and p = u, as given in (3). Let g be a function defined on [0, oo). Wc 
consider the gamma-type operator L t given by 

L t g(x):=Eg(^f^j, x>0,t>0, (4) 

provided that this operator is well defined, that is, L t \g\(x) < oo, x > 0, t > 0. Then, for 
F continuous on (0,oo), L* t F in (2) can be written as (see [4], page 228) 

w . v (ati). sr (»3!a±2), *>o, 4 >o. ( 5) 

It can be seen that the rates of convergence of L t g to g are, at most, of order 1/t (see 
(40) below). Our aim now is to get faster rates of convergence. To this end, we will con- 
sider the following operator, built using a classical acceleration technique (Richardson's 
extrapolation - see, for instance, [9, 11]): 

Lf ] g(x) := 2L 2t g(x) L t g(x) = 2Eg (^|^) - Eg , x > 0. (6) 

We will obtain a rate of uniform convergence from L^g to g, of order 1/t 2 , on the 
following class of functions: 

V:={geC A ([0,oo)): \\x 2 g lv (x)\\ < oo}. (7) 

The problem with L t g is that when tx is not a natural number, L t g(x) is given in terms 
of Weyl fractional derivatives of the Laplace transform (see [6], page 92) and, in general, 

[21 

we are not able to compute them in an explicit way. However, if wc modify L t g using 
linear interpolation, that is, 

Ml%(x) := (tx - [tx]) ) + ([tx] + 1 - tx) (4 2] 9 (M) ) , (8) 



564 C. Sangiiesa 

then we observe that the order of convergence of M^g to g is also 1/t 2 on the following 
class of functions: 

2? i: ={.geC 4 ([0, TO )): \\g"(x)\\ < oo and \\x 2 g iv (x)\\ < 00} . (9) 

Moreover, the advantage of using AL} 'g instead of L\ 'g to approximate g is the com- 
putability. In the following result, we note that the last approximation applied to a 
distribution function F is related to L* t F, as defined in (2). From now on, N* will denote 
the set N\{0}. 

Proposition 2.1. Let X be a non-negative random variable with Laplace transform 4>x- 

[21 

Let L*[F,t > 0, be as defined in (2) and let M t F be as defined in (8). We have 

F(0), ifk = Q, 

k- 

~2t 



m1 2] f( x ) = ( te - it x })MM F (W±±) + (N + 1 tx)M m F (M j . 



t J ' 1 V t 

Proof. Let t > be fixed. First, observe that by (8), we can write 



MpFr-\=L?F[- t ], keK (12) 

Now, using (6) and (4), we have M t [2] F(0) = Lf ] F(Q) = F(Q), which shows (10) for k = 0. 
Finally, using (6), (4) and (5), we have, for k £ N*, 



(13) 



Thus, (12) and (13) show (10) for fc e N*. Note that (11) is obvious by (8) and (12). This 
completes the proof of Proposition 2.1. □ 

In the following example, we illustrate the use of the previous approximant in the 
context of random sums, defined in the following way. Let (Xi)j g pj* be a sequence of 
independent, identically distributed non- negative random variables. Let M be a random 
variable concentrated on the non-negative integers, independent of (Xi)i g N*. Consider 
the random variable 

A I 



with the convention that the empty sum is 0. 
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Example 2.1. As pointed out in the Introduction, an explicit expression for the dis- 
tribution of (14) is usually not possible. Our aim is to consider an example in which 
this distribution can be evaluated explicitly and to compare our approximation method 
with some others considered in the literature. To this end, we consider that M follows a 
geometric distribution of parameter p, that is, P(M = k) = (l —p) k p, k £ N and (Xj)j S N* 
are exponentially distributed (with mean 1, for the sake of simplicity). In this case, it 
is well known (use LS transforms, for instance) that (14) has the same distribution as 
a mixture of the degenerate distribution at (with probability p) and an exponential 
distribution, that is, 

F{x)~p{^yCi<x\ =p+(l-p)(l-c- px ) = l-(l-p)e- px , x>0. (15) 

When an explicit expression is not possible, the usual approximate evaluation method 
is by discretizing the summands in (14) and then using recursive methods found in the 
literature for discrete random sums. By considering (1) as a first discretization method, 
we have (see [5], page 391) 

k\ ( t \ k 1 



P \ x ' t= t) = \T+i) t + i' * = ' 1 "- (16) 

Thus, £Xa=i is a geometric sum of geometric distributions with parameter r = (1 + 
t) _1 . It is easy to check (use LS transforms, for instance) that the distribution of such a 
random variable is a mixture of the degenerate distribution at (with probability p) and 
a geometric distribution with parameter p* = 1 — (1 — r)(l — (1 — p)r) _1 = 1 — t(t +p)~ 1 , 
so that for each k £ N, 



'I - 



P (Jx-.a) 



-P+(l-p)(l-(l-p*) k+i )=l-(l-p) 



t 



t+P 



fc+1 



(17) 



Note that the first equality in (17) follows by recalling (2) and noting that (^j=i -^i)*' has 
the same distribution as 2j=i-^7* ( see [16], Proposition 2.1). Actually, a more natural 
way (in this case) to compute (17) is to evaluate the LS transform of (X^=i^i)*' an< l 
then apply (1) and (2). However, the previous computations enable easier comparisons 
with the following method. In fact, one of the most obvious (and widely used) methods 
to discretize the summands in (14) is by a rounding method. For instance, a rounding 
down method (we round Xi to \tXi\t~ 1 ) yields 

P(^l = \ ) = < X X < *±!) = e-*/«(i - k £ N. (18) 
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X - 


k 
5 


F(l) 






W 5 F(f) 


= 


_ 
5 


0.1000 


0.1176 


0.1195 


0.1000 


1 = 


5 
5 


0.1856 


0.2008 


0.2108 


0.1848 


5 = 


25 
5 


0.4541 


0.4622 


0.4907 


0.4412 


10 


_ 50 

— 5 


0.6689 


0.6722 


0.7054 


0.6383 


15 


_ T5 

5 


0.7992 


0.8002 


0.8296 


0.7576 


20 


_ 100 

5 


0.8782 


0.8782 


0.9014 


0.8327 


30 


_ 150 

5 


0.9552 


0.9548 


0.9670 


0.9142 


40 


_ 200 


0.9835 


0.9832 


0.9890 


0.9524 



5 



In this case, \tXj] is a geometric sum of geometric distributions with parameter 



r' = 1 — e 1 / t . We denote by RtF the distribution function of Y^iLi ^r^- Using the same 
arguments as for (17), we obtain for each k EN that 



Finally, it would be interesting to compare the previous 'discretization methods' with a 
'transform method.' To this end, we consider the Laplace transform of F in (15) (instead 
of its LS transform), that is, 

w F {6)= e- 9u F(u)du= e >0, 
Jo 0+p 

and apply the Post-Widder inversion formula (see [10], page 233), defined for t <G N* as 



In Table 1 (computations with MATLAB) we consider a 'rough' discretization interval 
(t = 5), a small p (p = 0.1) and present, for different x = fc/5, the exact values of F 
(column 2), the L* t approximation (column 3), the 'rounding down' discretization (column 
4) and the Post-Widder inversion (column 5). 

As we can see in Table 1, L\F provides a better approximation than R^F. The intuitive 
explanation of this fact is that, when approximating Xi by X)i=i > ^ ne error in 

the approximation can be controlled 'uniformly' regardless of the distribution of M (see 
[16], Theorem 4.3). This effect is obvious when we choose M with a large expected value 
(our choice of a small p is for this reason - for larger values of p checked, L^F is also 
better, but the difference is less appreciable). However, if we compare the approximations 
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Table 2. Comparison of Mf' in (10) with Gf in (21) 



X - 


fe 

5 




L%F{^) 




Mf-F(f) 


GfF(f) 


1 = 


_ 5 
5 


0.1856 


0.1848 


0.1852 


0.1856 


0.1856 


5 = 


25 
5 


0.4541 


0.4514 


0.4528 


0.4541 


0.4538 


10 


_ 50 

5 


0.6689 


0.6656 


0.6673 


0.6689 


0.6677 


15 


_ 75 

5 


0.7992 


0.7962 


0.7977 


0.7992 


0.7975 


20 


_ 100 

5 


0.8782 


0.8758 


0.8770 


0.8782 


0.8766 


30 


150 
— 5 


0.9552 


0.9538 


0.9545 


0.9552 


0.9553 


40 


200 
— 5 


0.9835 


0.9829 


0.9832 


0.9835 


0.9854 



L^F and W$F, we see that the last one is better for small values, whereas the first one is 
better for large values. To explain this fact, it is interesting to point out that WtF, like 
L\ F, admits the following well-known representation. For a function g defined on [0, oo), 
we can write, as in (5) (see [10], pages 220, 223), 

Wtg{x)=Eg(x^j, x>0. (20) 

Note that the mean of the 'random points' defining Wt in (20) is E(xt^ 1 S(t)) = x, 
whereas for L* t in (5), we have E{t~ l S{\tx] + 1)) = ^([tx] + 1). This means that W t 
is centered at x, whereas L* t is 'biased'. The benefits of this property for Wt are ob- 
served at small values in Table 1. However, we have Vai(xt~ 1 S(t)) =i _1 a; 2 , whereas 
Var(t _1 S'([te] + 1)) = t~ 2 ([tx] + 1), the latter being of order t~ 1 x 1 as t — > oo. The greater 
variability of the random variables defining Wt for greater values of x produces an un- 
desired effect in the approximation. 

We now show the improvement in the approximation which occurs when using M t , 
as defined in (10), instead of L* t . In Table 2 below (t = 5), we compare M\ 2] F (column 
5) with Richardson extrapolation for WtF (or Stehfest enhancement of order two for the 
Post-Widder formula - see [1], page 40), that is, 

Gf ] F(x) :=2W 2 tF(x)-W t F(x), x>0. (21) 

As we can see, M§ F provides us with an exact value up to a four decimal places, whereas 

[2l 

G l b l F does not achieve this accuracy. 
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3. Error bounds for the approximation 



Let g €T>, as defined in (7). Our first aim is to give bounds of \\L\ g — g\\ in terms of 
||x 2 <7 TO (2;)||. To this end, we will use the following as 'test function': 

if x = 0, 

— log(x) I , otherwise. 




(22) 



Observe that 4> € V. In fact, by elementary calculus, 



6'{x)=x(l-\ogx), 6"(x) = -\ogx, 6"'(x)=-~ and S iv (x) = \. (23) 

X x 4 

In the next lemma, we make an explicit computation of L t (f>(x) in terms of the ^ (or 
digamma) function. This function is defined as (see [3], page 258) 



*(*):= ^logfrWJ^pog, 



ue-V'du, x>0, (24) 

and, therefore, using the last equality, we have the following probabilistic expression of 
the psi function in terms of the gamma process: 

V(x)= E\ogS(x), x>0. (25) 

We will use the following property of this function (sec [3], page 258): 

^f(x + l) = -+^(x). (26) 

x 

Lemma 3.1. Let (j> be defined as in (22) and let L t ,t > 0, be defined as in (4). We have 
that 

L t 0(x) = ^r^|^-y-l + te(te + l)(-*(te)+log(t))) > x>0. (27) 

Proof. Let t > and x > be fixed. First, using elementary calculus, (4) and (26), we 
can write 

L^) =£ ;_^^--log 

° .2/^3 l (u\ , 



2t 2 T{tx)J \2 °\t r 



2t 2 T(tx + 2)J \2 b \t 

(tx)(tx + 1) /3 „, fS(tx + 2) 
- E log 1 



2i 2 V2 t 
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Therefore, using (25), we can write 

(tx)(tx + l) /3 



LfMx) 



Now, using (26) twice, we have 



2t 2 



*(te + 2) + log(<) 



2(tx) + l . . 
V(tx + 2) = — + &(tx) 
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(29) 



(30) 



By (29), (30), we obtain 



L t (j>{x) 



(ix)(tx + l)/3 2(tx) + l 



2f 2 



2 tx(tx + 1) 



-*(te)+log(<) 



The result follows using elementary algebra in the expression above. 



□ 



In the next lemma, we will study the approximation properties of L t <j) to 4>. We will 
make use of the following inequalities for the psi function: 



< log(x) - $(x) < i, x > 0; 



log(x) - \ff(x) 



1 1 

< 



2x ~ 12x 2 ' 



x>0. 



(31) 
(32) 



We can find (31) in [7], page 374, whereas (32) is an immediate consequence of the fact 
that the function 

is completely monotonic (see [15], page 304) and thus non- negative. 
Lemma 3.2. Let <f> be as defined in (22) and let L t ,t > 0, be as defined in (4). We have 

3 



L t Mx) 



x log X 1 
2t + 3*2 



< 



8t 2 ' 



Proof. Let x > and t > be fixed. First of all, we can write 



On the other hand, 



x log a; 1 1 
2t + it 2 = 2*2 



(ix) 2 log(ix) + (ta) 2 log(i) 



(te) log te — (te) log t + 



(33) 



(34) 



(35) 
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Therefore, using Lemma 3.1, (34) and (35), we can write 
x log X 1 



L t (j>(x) - 4>(x) + 

1 - {txfm(tx) - (tx)^(tx) + (tx) 2 log(te) + {tx) log(ta) + ^ I (36) 



2t 3< 2 

1 / tx 

= ^ (( te ) 2 ( l0 S( te ) - - 2(b) ) + te(l0g(te) - - I 

By (31), we have that 1/2 < a;(log(x) - *(x)) < l,x > 0, and thus 

§<te(log(te)-tt(te))-§<§. (37) 
Thus, using (36), (37) and (32), we obtain (33). □ 
We are now in a position to state the following. 

rn] 

Theorem 3.1. Let g & as defined in (7) and let L\, t> 0, be as defined in (6). We 
have 

\L [2] g(x) - g(x)\ < ^\\xg'"{x)\\ + JL\\x 2 g m {x)\\. 

Proof. We will first see that g GT> implies that 

\\xg'"{x)\\ < \\x 2 g lv (x)\\ <oo. (38) 

To begin with, the fact that ||x 2 (7™(a;)|j < oo implies that lmx^oo x l+a g m (x) = for all 
< a < 1. By L'Hopital's rule, we also have that lmx^oo x a g"'(x) = 0, thus concluding 
that lim-r^oo g'" (x) ~0. Using this fact, we can write 

/•OO 

g"'{x) = - g m (u)du, 

J X 

which implies easily (38) as 

\xg"'(x)\<x r '^^' du^llxV-WH. 

J X u 

Now, let t > and let L t be as in (4). As a previous step, we will prove that 



L t g{x)-g(x) 



xg"(x) xg"'(x) 



2t it 2 



<^\\x 2 9 iv (x)\\, x>0. (39) 



To this end, let x > 0. Using a Taylor series expansion of the random point u = S(tx)/t 
around x and taking into account that E(S(x) — x) = 0, E(S(x) — x) 2 = x and E(S(x) — 
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x) 3 = 2x, we can write 

L t g{x) - g(x) = Eg 



E(S(tx)-tx) 2 „ E(S(tx)-txf 



2t 2 



6t 3 



</"(*) 



1 -e 

6 



S(tx)/t 



g™(0) 



S(tx) 



t 



xg"(x) xg"'{x) 1 f s(tx)/t 



E 

2t it 2 6 

Then, using (40), we get the bound 

xg"(x) xg"'(x) 



«T(0) 



S(tx) 



-6\ de. 



L t g{x)- g(x) 



21 



E 



S(tx)/t 



M 2 
S(tx) 



\\x 2 g vv {x)\\ r^^,S(tx)/t) 
< E I 

6 Jmin(x,S{tx)/t) 

l|sV (s)ll c , f S{tx)/t fS(tx) 



ej de 

S(tx) 



t 



¥ d9 



-E 



V t 



' 1_ 



de. 



Let 4>(-) be as in (22). Using (40) and (23), we have 



r , , \ , , ^ £ log £ 1 1 

Then, by (41) and (42), we can write 
xg"(x) xg"'(x) 



s[tx)/t ( S(tx) 
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(40) 



(41) 



L t g{x) - g{x) 



2t 



3t 2 



< \\x 2 g m (x 



LMx) 



x log x 1 



(42) 



2t ?>t 2 



Thus, (39) follows by applying Lemma 3.2. 

Observe that in (39), the only term of order 1/t is the one involving the second deriva- 
tive. By means of the operator as defined in (6), this term is eliminated. In fact, 
using (39), we have 



f2l 

L t g( x ) - g( x ) = 2 { L 2t9{x) - g{x)) - (L t g(x) - g(x)) 

L 2t g{x)-g{x)-j t g"{x)-^g'" 
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(43) 

L t g(x) - g(x) - ^g"(x) - ^ 9 '"(x)) - ^g'"(x) 
<±\\x9'"(x)\\ + ^\W(x)\\. 
This completes the proof of Theorem 3.1. □ 

[21 

Finally, in the next result, we study the approximation properties of M\ . 

[2] 

Theorem 3.2. Let g e T>\, as defined in (9) and let M t ,t > 0, be as defined in (8). We 
have 

\\Ml 2] g-g\\ < ±\\g»(x)\\+±\\xg"\x)\\ + ^\\x 2 g™(x)\\. 

Proof. First, note that g 6 T>\ implies that ||xg"'(x)|| < oo, thanks to (38). Now, let 
t > and x > be fixed. We write 



Ml 2] g(x)-g{x) = {tx-[tx])f y L [ S{^ 



tx] + 1 \ / [tx] + 1 



HM + l-tx)(LfUM]-g(M 



t 



( [te] j 



(44) 



Using the usual expansion 



\g(y) - g{x) -{y- x) g '{x)\ < ^-^\\g"\\ (45) 



and taking into account that 



N) ( 9 [ M±±) - g(x)) + ([tx] +l-tx)(g( [ ^)- g(x) 



t 



< [ 9{x) ~ N + l tx g , 



t 



At^\_ g{x) _M_tx gl{x) 



t 
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we obtain from the above expression and (45) that 



( [ ^ 1 tl ) g(x)) + (N + 1 - tx) U [ ^) 



< {tx- [tx]) 



([tx] + l-txY 
2t 2 



+ {[tx] + l-tx) 



([tx] -tx)' 
2t 2 



(47) 



{tx - [tx]){[tx] + l-tx) „ 1 „ 
= ^ 1|0 ll<^ll<7 II, 

the last inequality holding since for each k £ N, the supremum of (u — k){k + 1 — u), k < 
u < k + 1, is attained at u = k + 1/2. On the other hand, taking into account Theorem 
3.1, we have 



(**-N)(4VM±i 



t 



[tx] + 1 



tx] 



<\\L? ] g-g\\<—\\xg">{x)\\ 
The result follows by (44), (47) and (48). 



1 



16i 2 



x z 9 m {x)\\. 



(48) 



□ 



4. Application to gamma distributions 

In this section, we will study the case of gamma distributions, that is, distributions with 
density functions as given in (3). It is not hard to sec that these distributions arc in the 
class 2?i, for a shape parameter p = 1 or p > 2, and, therefore, we are a position of apply 
Theorem 3.2. The aim of this section is to show that, in fact, the bounds in this theorem 
can be uniformly bounded on the shape parameter, which will be an advantage when 
dealing with mixtures of these distributions. From now on, we define 



x>0, if peK\{0,-l,-2,...}, 



f P {x):={ T{p) ' 

0, x>0, if pe {0,-1,-2,...}. 



(49) 



The 'odd' definition of f p for p e {0, —1, —2, . . .} is for notational convenience in (51). For 
p > 0, the function above is the density of a gamma random variable as in (3), with scale 
parameter a — 1 . Results for another scale parameter will follow by a change of scale 
(see Proposition 5.1 below). First, we will consider the case p= 1, that is, an exponential 
random variable. As the distribution function of this random variable presents no compu- 
tational problems, it makes no sense to approximate it. However, when we consider the 
problem of approximating a general mixture of Gamma distributions, the exponential 
distribution could be a component. 
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Lemma 4.1. Let F(x) = 1 - c- x 7 x > 0. For t > 0, let Al\ 2] F be as defined in (8). We 
have that 



* 11 " 1 ° 6c 4c 2 t 2 



Proof. First of all, note that |F( fc )(x)| = e x and that sup x>0 x k e x = k k e k ,k = 
1,2, Thus, we have 

||F"||=1, ^'"(aOl^e- 1 and \\x 2 F lv (x)\\ = 2 2 e" 2 . (50) 

The conclusion follows by taking into account Theorem 3.2. □ 

We will now deal with the case p > 2 in (49). The two following lemmas will be 
useful in order to bound the derivatives of this density. For the sake of brevity, they 
are stated without proof (only elementary calculus is required). For the proofs, we refer 
the interested reader to [17], a. preliminary version of this paper (available online). 

Lemma 4.2. Let f p (-), p > 0, be as defined in (49). We have, for all ngN, 

(51) 



£T )(-iyf p - n+i (x), x >o, 



which nLi(p-J') = !• 



Next, we formulate a technical lemma in which we define certain decreasing functions 
which will be used to bound the weighted derivatives of f p . 



Lemma 4.3. We have: 
(i) the function 



9i(p)~^e-^(p-ir-\ P >1 (52) 



(,9i (1) = 1), is decreasing in p; 
(ii) the function 

»(P) : = ^c^- 1 ^ 1 / 2 ^ (p-\ + l^p-sj^ 2 , p>l, (53) 



is decreasing in p; 
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(iii) the function 

^p)^f^f~ {p ^^ 1} (V^i~ir 2 (V^r\ p>i (54) 

(53(2) = 1) , is decreasing in p; 

(iv) the function 

9*(p)--=Y^f- {p - V ^\p-V3p~2r 2 (V3p~^-l)\ P>2 (55) 

(54(2) = 1), is decreasing in p. 

In the following result, we get bounds of the quantities required in Theorem 3.2, de- 
pending on the shape parameter p, but also decreasing on p. 

Lemma 4.4. Let f p be as in (49) and g%,i= 1,2,3,4, be as in Lemma 4.3. We have: 

(i) sup I f p (x) I = c/i (p), p>l; 

x>0 

(ii) sup\xf'(x)\=g 2 (p), p>l; 

ir>0 

(hi) sup\f'(x)\=g 3 (p), p>2; 

x>0 

(iv) sup|a;/''(a;)| <max{5i(p-l),5f 2 (p-l)}, P>2; 

x>0 

(v) suplx 2 /;"^)! < 5 4(p) + 3 52 (p-l)+5i(p-l), P>2. 



Proof. To show part (i), it is clear that, for p > 1, 

e-Cp- 1 )^- i)p-i 
sup/ p (a;) = f p (p - 1) = — r 

x>o r(p) 

and (i) follows by recalling (52). To show part (ii), we have (see [16], Remark 3.2 and 
Lemma 5.2) 

1/11 ^~ 1/2 



x>0 



r(p) V" 2 2 



sup|x/;(x)| = — - U-- + -V4P - 3 e-P-i/2+1/2^3^ p>lj (56) 



and (ii) follows by recalling (53). To show part (iii), by (51), we have for p > 2 that 

f p (x) = Y ±-re- x xV- 2 (p-l-x), x>0, (57) 

f^x) = f ±- ) c-*xP- 3 ((p-l)(p-2)-2(p-l)x + x 2 ), x>0, (58) 
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and it can be easily checked that the zeros of f p (x) are p\ := p — 1 — %Jp — 1 and p2 := 
p — 1 + y 7 ^ — 1. Therefore, |/p(x)| must attain its maximum value at either p\ or pi. 
Actually, p\ corresponds to the maximum. To show that, we will see that 

W- =e ^f^Zl- i r a >l > P >2. (59) 



|/> 2 )| VVP^T+l 
To show the last inequality in (59), taking logarithms, we will prove that 



r 1 ( P ):=2^p~T+(p-2)(log(^p~T-l)-\og(^p~T+l))>0 1 p>2. (60) 
Define 



Pi(b) := + (log(6 - 1) - log(6 + 1)), b > 1. 



Note that 



ri(p) = (p-2)p 1 (Vp^T), P>2- (61) 

We will first prove that 

Pi(b)>0, b>l. (62) 

To show (62), it is readily seen that p[(b) = — 4(& 2 — l)~ 2 ,b > 1, so that p\ is decreasing. 
As limb^oo pi(&) =0, we have (62). This implies also (60), recalling (61). Therefore, we 
conclude that 

sup|/;(x)| = /» = J-^-^-^V^T - \r-\Jp—\f-\ (63) 

which, together with (54), shows (iii). 

To show part (iv), note that by using (51), we can write f' p (x) = f p -i(x) — f p (x) and, 
therefore, 

xf' p \x) = xf p _ 1 {x)-xf' p {x), x>0,p>2. (64) 

On the other hand, we see in (58) that f p _i(x) and f p (x) have the same sign for < x < 
p — 2 and p — 1 < x < oo and, therefore, using part (ii) and Lemma 4.3(i), we can write 

sup \xf p \x)\<max(g 2 (p-l),g2(p)) = g2(p-l)- (65) 

xf[p— 2,p— 1] 

On the other hand, we have, by (58), 

xf p \x) = J-^c-^- 2 ((p - l)(p - 2) - 2(p - l)x + x 2 ). (66) 

Using the above expression and taking into account that, for p — 2 < x <p — 1, 

e -^(p-2) < e - p - 2 (p-2) p - 2 and \(p-l)(p-2)-2(p-l)x + x 2 \=p-l, (67) 
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the last inequality holds as \(p — l)(p — 2) — 2(p — l)x + x 2 \,p — 2 < x < p — 1, attains its 
maximum value at p— 1. From (66) and (67), we conclude that 

sup \xf;(x)\<-±-e-^){p-2y- 2 {p-l)=g 1 (p-l), (68) 

x&[p-2,p-X) 1 IP J 

where the last inequality follows by recalling (52). Thus, (65) and (68) conclude the proof 
of part (iv). To show part (v), let p > 2. First, we have, by (51), 

f p "(x) = / p _ 3 (a:) - 3f p - 2 (x) +3/ p _i(x) - f p (x) 



e "x- 



t(p) 



t(p) 

Therefore, if we call 

6 x ^ 



((p - l)(p - 2)(p - 3) - 3(p - l)(p - 2)x + 3(p - l)x 2 - x 3 ) (69) 
((p - 1 - x) 3 + 3(p - - (p - 2)) - (p - 1)), x > 0. 



(70) 



ftp (a) := " ~ . (p - 1 - x) 3 , x>0, 

we have, recalling (57), 

= £ r|r (b ~ 1 ~ x)3 ~ 3{p ~ l)[x - {p - 2)) - {p 1)} 

= ftp(x) + 3x/ p _ 1 (x)-/ p _i(x), x>0. 
We will firstly see that 

sup | h p (x) | =g 4 (p) (71) 

x>0 

with <74(-) as defined in (55). Note that 

K(x) = 6 3 ( P - 1 - x) 2 (x 2 - 2px + (p - l)(p - 2)), x > 0. 

The maximum value of \h p \ will be attained at the roots of the last polynomials, being 
Pi : = p + \/3p — 2 and p2 := P — \/3p — 2. To check which value attains the maximum, 
define u := ^3p- 2. Note that pi = (w, + l)(u + 2)/3 and p 2 = (u- l){u - 2)/3. Then, 
with this notation, we will prove that 



\h P (p2)\ , u f(u-l)(u-2)\ {u2 ' 4)/3 /u 1 



ftp On) | V (w + i)(u + 2)7 V^ + i 



3 

>1, u>2. (72) 
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To show the last inequality in (72), taking logarithms, we will show that 

, , u 2 -4, f (u - l)(u - 2)\ , fu-l\ , . 

pa(u):=2u+— r— log ) (1 +3 io g — - >0, u>2. (73) 



(u + l)(ti + 2)/ + l 



Note that 

2u f (u-l)(u-2)\ u 2 -4 



p' 2 (u) = 2 +—\og[ 



3 



(u+l)(u + 2)y 3 V"- 1 "-2 " + 1 u + 2 
1 1 



- 1 u + 1 
4m 2 2u /(u-l)(it-2) 



m 2 -1 3 t3 \(u + l)(u + 2) 

We will show that p' 2 (u) < 0, u > 2. In fact, 

d 3 , . . 36 
dit 2u [u + \y yu — iy (u z — by 

and then 3(2u) _1 /92(u) is increasing. As \ : mi u ^ too 'i{2u)^ 1 p' 2 (u) =0, we conclude that 
3(2u) -1 x p' 2 (u) < and thus that p' 2 (u) < 0. Therefore, P2(u) is decreasing. This, 
together with the fact that lim tl _>. c>0 pi (u) = 0, proves (73) and therefore (72). Then, 
\\hp\\ = h p (p2) = g^ip), thus proving (71). The proof of part (iv) now follows easily by 
recalling (70) and using (71) and parts (i) and (ii). □ 

As an immediate consequence of Theorem 3.2 and Lemma 4.4, we have the following 
corollary. 

Corollary 4.1. Let F p be a gamma distribution with shape parameter p>2, that is, 

[2] 

whose density function is given by (49). Let Mf l ,t>0, be defined as in (8). We have 

\\M™F Fll<f 17 + 27 V ~ 2 -° 375 
\\M t F p -F p \\<{- + —)^~—g-. 

Proof. Let p > 2 be fixed. The result is an immediate consequence of Theorem 3.2, as 
F^ = f p , as defined in (49). Therefore, by Lemma 4.4(iii) and Lemma 4.3(h), we have 
that 

||F;|| = ||/;||= 53 (p)<53(2) = 1. (74) 
On the other hand, we see that by Lemma 4.3(i), we have that 

ffi(p-l)<fli(l) = l and g 2 (p-l)<g 2 (l) = e-\ p>2. (75) 

Thus, using the above inequalities and Lemma 4.4(iv), we have 

\\xF;"(x)\\=\\xf^x)\\<l. (76) 
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Finally by Lemma 4.4(v), Lemma 4.3(iv) and (75), we have 

\\x 2 F; v (x)\\ = \\x 2 f^(x)\\<g i (2) + 3g 2 (l) + g 1 (l) = 2 + 3e- 1 . (77) 

Using (74), (76), (77) and Theorem 3.2, we obtain the result. This completes the proof 
of Corollary 4.1. □ 

5. Applications to mixtures of Erlang distributions 
and phase-type distributions 

In this section we apply the results from the previous section to mixtures of Erlang 
distributions and to random sums of thereof. In order to undertake this study for an 
arbitrary scale parameter, we need the following result which shows the behavior of 

[21 

M± 'F under changes of scale. 

Proposition 5.1. Let X be a random variable with distribution function F. For a given 

[21 [21 

c > 0, denote by F c the distribution function of cX . Let M\ F and M\ F c , t>0, be the 
respective approximations for F and F c , as defined in (8). We have that 

Ml 2] F c (x) = M% ] F(x/c), x>0. (78) 

Therefore, 

||A/ t [2] F c -F c || = \\M l £ ] F-F\\. (79) 
Proof. Let t > and c > be fixed. First, we will see that 

m? ] f*(^=m$f(^, ken, (80) 

and, therefore, (78) is satisfied for points in the set k/t,k £ N. To this end, we use (12) 
and (6), and take into account that 



F c {x) =F(x/c), x>0, (81) 



to write, for all k £ N, 



Mi 'F c \ - 



t j \ n ) \ t 

2EF( S -^]-EF(m=M$F 



k_ 

2ct ) V ct I "~ ct ' \ct 



(82) 
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thus proving (80). For a general x > 0, we use (8) and (80), to see that 
Ml 2] F c (x) = (tx - [tx])Ml 2] F c (J^Ltl) + (N + 1 - tx)Ml 2] F c (M\ 



( te - N )Ml^(^l) + (N + l- te )^(M) = M^F^- 



the last inequality being trivial as = (ct)(x/c). This concludes the proof of (78). Finally, 
(79) follows easily from (78) and (81), as we have 

sup|M t [2I F c (x) - F c {x)\ = sxrp\M% ] F(x/c) - F(x/c)\. 

This concludes the proof of Proposition 5.1. □ 

As an application of the results in the previous section, we will consider the class of 
(possibly infinite) mixtures of Erlang distributions recently studied by Willmot and Woo 
(see [19]). More specifically, let F( a ^\,a > 0,j e N*, be the distribution function corre- 
sponding to the density f( a ,j) given in (3) (an Erlang j distribution with scale parameter 
a). We will consider a finite number of scale parameters arranged in increasing order 
(0 < ai < ■ • • < o n ) and a set of non-negative numbers Pij,i = 1, . . . ,n, j = 0, 1, 2, . . . , such 
that Yh=i Sjli Pij ' = P — 1) an d define the class of distribution functions M.E(ax, . . . , a n ) 
given as 

n oo 

F(x) = (l-p)+535^p ii f ai j(x), x>0 (83) 

t=l 3=1 

(we consider a slight modification of the class in [19], page 103, as we allow the point 
mass at with probability 1 —p). Based on [19], page 103, we can alternatively write 
(83) by using only the maximum of the scale parameters, that is, 

oo 

F(x) = (l-p)+J2Pj F *n,j(x), a;>0. (84) 

3=1 

Moreover, the class (84) is a wide class containing many of the distributions considered in 
applied probability, such as (obviously) finite mixtures of Erlang distributions, but also 
the class of phase-type distributions (see Proposition 5.3 below). Every random variable 
having a representation as in (83) can be approximated by means of M t , as shown in 
the following result. 

Proposition 5.2. Let F be a distribution function of the form M£(ai, . . . , a n ), < a\ < 

\'2] 

■ ■ ■ < a n , as in (83). Let M\ ,t > 0, be defined as in (8). We have 



pi F _ F |l<('i! + ^)?k(^«M. (85) 
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Proof. Let t > and < a\ < • • • < a n be fixed. The linearity of Ml J yields 

n oo 

Ml 2] F(x) = (l-p)+J2J2Pv M t 2]F ^i(x), *>0. (86) 

»=1 j=l 



By Corollary 4.1, we can write, for a scale parameter 1 

17 27 \ 1 

12 + 16eyt2 



Mf ] i^-i^<fS + ^U, i = 2,3,.... (87) 



Moreover, using Lemma 4.1, we have 

Now, let the general scale parameters be Oj, i = 1, . . . ,n. We use the fact that given X, 
a gamma random variable of scale parameter 1, X/a,i is a gamma random variable of 
scale parameter a 2 ;, and, therefore, using Proposition 5.1, (87) and (88), we have for each 
a; , i — 1 , . . . , n, and j € N* , 



\\M^F ai , F ai J\ = WM^Frj ftj < + ^ j ^. (89) 
Thus, using (86) and (89), we have 

n oo 

iiA/ t [2] F-F|| <£E^N 2I ^-^-ii 



17 27 \ a 



< 



i=i j=i 

17 , 27\Er=i(E^ftiK ? 



(90) 



12 16e/ i 2 

This completes the proof of Proposition 5.2. □ 

As a consequence of the previous result, we can provide error bounds for compound 
distributions (that is, distribution functions of random sums, as in (14)) when the sum- 
mands are mixtures of Erlang distributions, as stated in the following result. 

Corollary 5.1. Let G be the distribution junction of a random sum, as in (14), in which 
the sequence of (Xi)i e pj* has a common distribution Ai£(ai, . . . , a n ), < a\ < ■ ■ ■ < a n , 

\'2] 

as defined in (83). Let All be as in (8). We have that 



\M t 



12] r ru< (17 27\ (1-G(0))a 2 r , 
1 12 + 16c"i 
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Proof. The proof is immediate, taking into account that a mixture of Erlang distri- 
butions A4£(ai, . . . , a n ), < a± < ■ ■ ■ < a n , can be expressed as in (84) and compound 
distributions of these random variables are also mixtures of Erlang distributions (see [19], 
page 106, with a slight modification in the coefficients, as we allow a point mass at 0), 
that is, we can write 

oo 

G(x) = q + qjF an!j (x), x>0, 
j'=i 

in which {qj,j = 0, 1, . . .} form a probability mass function (obviously, go = G(0)). The 
result follows using the above expression and Proposition 5.2. □ 

The class of phase-type distributions, of great importance in applied probability, can be 
expressed as mixtures of Erlang distributions. A phase-type distribution is defined as the 
time until absorption in a continuous-time Markov chain with one absorbent state (see, 
for instance, [12], Chapter II or [8], Chapter VIII, and the references therein). A phase- 
type distribution can be expressed in terms of a matrix exponential as follows. Consider 
a vector a = (a.\, . . . , a n ) of non-negative numbers such that a% + • • ■ + a n < 1. Let A 
be a n x n matrix with negative diagonal entries, non-negative off-diagonal entries and 
non-positive row sums. A non-negative random variable A is a phase-type distribution 
PH(a,A) if its distribution function can be written as 

F(x) = l-ae xA l', x>0, 

in which 1' represents the transpose of the nth dimensional vector 1 = (1,...,1). 
Note that phase-type distributions are absolutely continuous random variables when 
ai + ■ • • + OL n = 1, having positive mass at (of magnitude 1 — (ai + ■ ■ ■ + a n )) when 
a± H h Oin < 1- Phase-type distributions have been extensively studied from both the- 
oretical and practical points of view. For instance, it is well known that phase-type dis- 
tributions have rational Laplace transforms, thus allowing numerical computation using 
our approximation procedures. Also, in the next proposition, we will give an expression 
of phase-type distributions in terms of mixtures of Erlang distributions. This, together 
with Proposition 5.2, provides our approximations with rates of convergence. The proof 
of the next result is based on the following property of phase-type distributions, due to 
Maier (see [13], page 591). Let / be the density of an absolutely continuous phase-type 
distribution. There exists some c > verifying 

> 0, j € N. (91) 

x=0 

We are now in a position to state the following. 

Proposition 5.3. Let F be a phase-type distribution PH(a,A), with a>i + • • • + a n > 0. 

Let c > be such that the absolutely continuous part of F satisfies the property (91). 
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Then, F can be expressed as a mixture of Erlang distributions, that is, 

oo 

F(x)=p + Y2Pj F cAx), z>0, (92) 
i=i 

in which po = 1 — (ai + ■ ■ ■ + a n ). 

Proof. To prove (a), assume first that F is absolutely continuous, that is, that a\ H h 

a n = 1. Its density is then given by /(x) = — ae x Al', x > 0. We choose a c > verifying 
(91). Note that we can write 

c cx f(x) = ~ac x{cI - A) Al', x>0. (93) 

It can be easily checked that the function — ac x<yCl ~ A ^ Al' , x € M is analytic in M, so if we 
consider the Taylor series expansion of this function around and take into account (91) 
and (93), we have 

from which we can write (recall (3)) 

p 1 \ \ - Cj c J ^ 1 x J e cx cj 

^( a; ) = E^T -j =E^TT^+i(- T )' x>0 > 

j=0 J ' j=0 

and, in this way, we obtain the expression of / in terms of a mixture of Erlang densities 
with shape parameter c (by construction, the coefficients are non-negative and integrating 
both sides in the above expression, we see that their sum is 1). As a consequence, we can 
write 

OO 

thus having expressed F as a mixture of Erlang distributions, as in (92). Now, assume 
that < ai + ■ ■ • + O-n < 1- This means that F has a point mass at of magnitude 
po := 1 — (qi + • • • + a„). The absolutely continuous part of F (F ac ) is a phase-type 
distribution (PH(a, A)) with a = (a\ + ■ ■ ■ + a n )^ 1 a. Let c > be such that F ac verifies 
property (93). We can write, thanks to (94), 

oo 

F(x) =p + (l- Po )F ac (x) = Po +J2{l-Po)^ L F Cij (x), x>0. 

3 = 1 ° 



This completes the proof of Proposition 5.3. 



□ 
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Remark 5.1. Expansions similar to those given in Proposition 5.3 can be found in 
[12], page 58. These expansions are obtained using a representation PH(a 7 A) of the 
distribution under consideration. Note that if we denote by ||A|| the maximum absolute 
value of the entries of A, then it is easy to check using (93) (see [14], page 751) that 
c= \\A\\ verifies (91). However, as the representation of a phase-type distribution is not 
unique, this value might not be the optimum one. Also, observe that the error bound 
given in (85) indicates that we should take c to be as small as possible. This problem, 
then, is closely connected to Conjecture 6 in [14], concerning the minimum c satisfying 
(91) and its relation with a phase-type representation having \\A\\ as small as possible. 
To the best of our knowledge, this conjecture remains unsolved. 
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